Geological Inference from Textual Data using Word Embeddings

Linphrachaya, Nanmanas, Gómez-Méndez, Irving, Siripatana, Adil

arXiv.org Artificial Intelligence

This research explores the use of Natural Language Processing (NLP) techniques to locate geological resources, with a specific focus on industrial minerals. Using word embeddings trained with the GloVe model, we extract semantic relationships between target keywords and a corpus of geological texts. The text is filtered to retain only words with geographical significance, such as city names, which are then ranked by their cosine similarity to the target keyword. Dimensionality reduction techniques, including Principal Component Analysis (PCA), Autoencoder, Variational Autoencoder (VAE), and VAE with Long Short-Term Memory (VAE-LSTM), are applied to enhance feature extraction and improve the accuracy of semantic relations. For benchmarking, we use the haversine formula to calculate the distance between the ten cities most semantically related to the target keyword and identified mine locations. The results demonstrate that combining NLP with dimensionality reduction techniques provides meaningful insights into the spatial distribution of natural resources. Although the predicted cities fall in the same region as the known mine locations, the accuracy leaves room for improvement.
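The ranking and benchmarking steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional vectors, and the Earth radius constant are assumptions made for illustration, and real GloVe embeddings would have far more dimensions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_places(target_vec, place_vecs, top_k=10):
    """Rank place names by cosine similarity to the target keyword vector.

    place_vecs is a hypothetical dict mapping a place name to its embedding.
    """
    scored = [(name, cosine_similarity(target_vec, vec))
              for name, vec in place_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Toy usage with made-up vectors (not real embeddings).
toy_vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
ranking = rank_places([1.0, 0.0], toy_vectors, top_k=3)
```

In this sketch, each of the top-ranked places would then be compared against known mine coordinates with `haversine_km` to obtain the benchmark distances.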


Data Science Education in Undergraduate Physics: Lessons Learned from a Community of Practice

Shah, Karan, Butler, Julie, Knaub, Alexis, Zenginoğlu, Anıl, Ratcliff, William, Soltanieh-ha, Mohammad

arXiv.org Artificial Intelligence

It is becoming increasingly important that physics educators equip their students with the skills to work with data effectively. However, many educators may lack the necessary training and expertise in data science to teach these skills. To address this gap, we created the Data Science Education Community of Practice (DSECOP), bringing together graduate students and physics educators from different institutions and backgrounds to share best practices and lessons learned from integrating data science into undergraduate physics education. In this article we present insights and experiences from this community of practice, highlighting key strategies and challenges in incorporating data science into the introductory physics curriculum. Our goal is to provide guidance and inspiration to educators who seek to integrate data science into their teaching, helping to prepare the next generation of physicists for a data-driven world.


Neural Approaches to Entity-Centric Information Extraction

Zaporojets, Klim

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) has a huge impact on our daily lives, with applications such as voice assistants, facial recognition, chatbots, and self-driving cars. Natural Language Processing (NLP) is a cross-discipline of AI and Linguistics, dedicated to studying the understanding of text. This is a very challenging area due to the unstructured nature of language, with many ambiguities and corner cases. In this thesis we address a very specific area of NLP that involves the understanding of entities (e.g., names of people, organizations, locations) in text. First, we introduce a radically different, entity-centric view of the information in text. We argue that instead of using individual mentions in text to understand their meaning, we should build applications that work in terms of entity concepts. Next, we present a more detailed model of how the entity-centric approach can be used for the entity linking task. In our work, we show that this task can be improved by performing entity linking at the coreference cluster level rather than for each mention individually. In our next work, we further study how information from Knowledge Base entities can be integrated into text. Finally, we analyze the evolution of entities from a temporal perspective.


Machine Learning Technique Predicting Video Streaming Views to Reduce Cost of Cloud Services

Darwich, Mahmoud

arXiv.org Artificial Intelligence

Video streams account for the largest share of online traffic. Multiple versions of a video are created to fit the user's device specifications. In cloud storage, keeping all versions of frequently accessed video streams in the repository for the long term imposes a significant cost on video streaming providers. Generally, the popularity of a video changes from one period to the next, which means that the number of views a video receives may drop, in which case the video should be deleted from the repository. Therefore, in this paper, we develop a method that predicts the popularity of each video stream in the repository in the next period. In addition, we propose an algorithm that uses the predicted popularity of a video to compute the storage cost and then decides whether the video will be kept in or deleted from the cloud repository. The experimental results show a 15% reduction in the cost of cloud services compared to keeping all video streams.
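A keep-or-delete decision of the kind this abstract describes might be sketched as follows. The abstract does not give the cost model, so everything here is a hypothetical stand-in: the `VideoVersion` fields, the per-GB storage price, the per-view cost of re-creating a deleted version on demand, and the comparison rule itself.

```python
from dataclasses import dataclass


@dataclass
class VideoVersion:
    name: str
    size_gb: float
    predicted_views: int  # assumed output of some popularity predictor


def storage_cost(video, price_per_gb):
    """Cost of keeping one version stored for the next period (assumed linear)."""
    return video.size_gb * price_per_gb


def should_keep(video, price_per_gb, recreate_cost_per_view):
    """Hypothetical rule: keep the version if storing it costs no more than
    re-creating it on demand for every predicted view."""
    on_demand_cost = video.predicted_views * recreate_cost_per_view
    return storage_cost(video, price_per_gb) <= on_demand_cost


def plan_repository(videos, price_per_gb, recreate_cost_per_view):
    """Split versions into names to keep and names to delete."""
    keep, delete = [], []
    for video in videos:
        if should_keep(video, price_per_gb, recreate_cost_per_view):
            keep.append(video.name)
        else:
            delete.append(video.name)
    return keep, delete


# Toy usage with made-up numbers.
catalog = [
    VideoVersion("popular_720p", size_gb=2.0, predicted_views=1000),
    VideoVersion("stale_720p", size_gb=2.0, predicted_views=0),
]
kept, deleted = plan_repository(catalog, price_per_gb=0.02,
                                recreate_cost_per_view=0.001)
```

With these made-up prices, the version with many predicted views is retained and the one with none is dropped; a real system would substitute the provider's actual pricing and the paper's own cost computation.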


Determining Sentencing Recommendations and Patentability Using a Machine Learning Trained Expert System

Brown, Logan, Pezewski, Reid, Straub, Jeremy

arXiv.org Artificial Intelligence

This paper presents two studies that use a machine learning expert system (MLES). One focuses on a system to advise United States federal judges on consistent federal criminal sentencing, based on both the federal sentencing guidelines and offender characteristics. The other study aims to develop a system that could prospectively assist the U.S. Patent and Trademark Office in automating its patentability assessment process. Both studies use a machine learning-trained rule-fact expert system network to accept input variables for training and presentation and output a scaled variable that represents the system recommendation (e.g., the sentence length or the patentability assessment). This paper presents and compares the rule-fact networks that have been developed for these projects. It explains the decision-making process underlying the structures used for both networks and the pre-processing of data that was needed and performed. By comparing the two systems, it also discusses how different methods can be used with the MLES.